Parallel computing method for total energy and energy gradient of non experience molecular-orbital method

ABSTRACT

A parallel computing method using a parallel computer having a plurality of processors is provided, wherein, when a 2-electron integration is transformed from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd), indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors, and each of the plurality of processors performs processing for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital to be calculated. Consequently, the grain sizes are equalized, many commodity processors can be connected with high performance, and the cost of a high-speed computer is reduced. Further, the computer as a whole can obtain a large main-memory area through the use of inexpensive local memories.

TECHNICAL FIELD

The present invention relates to a parallel computing method for total energy and energy gradient by using the multi-configuration self-consistent field (MCSCF) method and the configuration interaction (CI) method of the non-empirical (ab initio) molecular-orbital method.

BACKGROUND ART

Of the non-empirical (ab initio) molecular-orbital methods, the above-described MCSCF method and CI method have been used as typical computing methods that can account for electron correlation. According to the MCSCF method, the total energy E and its derivative with respect to an atomic-nucleus coordinate, that is, the energy gradient (a force applied to the atomic nucleus), are given as below.

$\begin{matrix}{E = \sum\limits_{ab}^{\mathrm{MO}} \gamma_{ab} h_{ab} + \frac{1}{2} \sum\limits_{abcd}^{\mathrm{MO}} \Gamma_{abcd} (ab|cd)} & (1) \\ {\frac{\partial E}{\partial q} = \sum\limits_{ab}^{\mathrm{MO}} \gamma_{ab} \frac{\partial h_{ab}}{\partial q} + \frac{1}{2} \sum\limits_{abcd}^{\mathrm{MO}} \Gamma_{abcd} \frac{\partial (ab|cd)}{\partial q}} & (2)\end{matrix}$

Here, MO indicates a molecular orbital and q indicates any of the nuclear coordinates x, y, and z of an atom forming the molecule. γ and Γ are functions of the electron-configuration coefficient C, which is obtained through a solution that will be described later. h_(ab) and (ab|cd) are a 1-electron integration and a 2-electron integration at a molecular-orbital base and are obtained from h_(rs) and (rs|tu) at an atomic-orbital base. The h_(rs) and (rs|tu) are defined as below.

$\begin{matrix}{h_{rs} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \chi_{r}(r_{1})\, h\, \chi_{s}(r_{1})\, dr_{1}, \quad h = -\frac{\hbar}{2}\nabla^{2} - \sum\limits_{i=1}^{\mathrm{Nuc.}} \frac{Z_{i}}{|r_{1} - R_{i}|} \quad (\text{where } \hbar = h/2\pi)} & (3) \\ {(rs|tu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \chi_{r}(r_{1}) \chi_{s}(r_{1}) \frac{1}{|r_{1} - r_{2}|} \chi_{t}(r_{2}) \chi_{u}(r_{2})\, dr_{1}\, dr_{2}} & (4)\end{matrix}$

Here, h indicates Planck's constant. Further, Nuc. stands for nuclei and indicates the number of atoms. Z_(i) indicates the electric charges of the nuclei and R_(i) indicates the positions of the nuclei. The relationship between a molecular orbital φ_(a) and an atomic orbital χ_(r) is shown as below.

$\begin{matrix}{\varphi_{a} = \sum\limits_{r}^{N} c_{ra} \chi_{r}} & (5)\end{matrix}$

The integrations of Equations (3) and (4) are transformed from the atomic-orbital base to the molecular-orbital base according to this relationship, as shown below.

$\begin{matrix}{h_{ab} = \sum\limits_{r}^{N} \sum\limits_{s}^{N} c_{ra} c_{sb} h_{rs}} & (6) \\ {(ab|cd) = \sum\limits_{r}^{N} \sum\limits_{s}^{N} \sum\limits_{t}^{N} \sum\limits_{u}^{N} c_{ra} c_{sb} c_{tc} c_{ud} (rs|tu)} & (7)\end{matrix}$

Here, N indicates the number of atomic orbitals. Although the present invention is provided under the assumption that the number of atomic orbitals is one thousand or more, this method can also be used where the number is smaller. c_(ra) is an element of the transformation matrix shown in Equation (5) for transforming the atomic orbitals into the molecular orbitals and is referred to as a molecular-orbital coefficient. c_(sb) is another molecular-orbital coefficient. According to the MCSCF method, both the electron-configuration coefficient C and the molecular-orbital coefficient c are obtained by the variational method. The CI method differs in that only the coefficient C is obtained.

The electron-configuration coefficient C is obtained through the following equations.

$\begin{matrix}{\sum\limits_{J}^{\mathrm{CSF}} (H_{IJ} - \delta_{IJ} E) C_{J} = 0} & (8) \\ {H_{IJ} = \sum\limits_{ab}^{\mathrm{MO}} \gamma_{ab}^{IJ} h_{ab} + \frac{1}{2} \sum\limits_{abcd}^{\mathrm{MO}} \Gamma_{abcd}^{IJ} (ab|cd)} & (9)\end{matrix}$

Here, CSF stands for configuration state function, that is, an antisymmetrized determinant. A wave function is given as a linear combination of the CSFs. δ_(IJ) is the Kronecker delta: its value is one where I=J and zero otherwise.

The following quantity is required for determining the molecular-orbital coefficient.

$\begin{matrix}{y_{ac} = \sum\limits_{b} \sum\limits_{d} \sum\limits_{x} \sum\limits_{y} \{ (ab|xy) \Gamma_{cdxy} + 2 (ax|by) \Gamma_{cxdy} \} u_{bd}} & (10)\end{matrix}$

Here, u_(bd) is a matrix relating to linear transformation of the molecular orbitals.

In either the MCSCF method or the CI method, generation of the 2-electron integrations (ab|cd) at the molecular-orbital base accounts for most of the computing cost. According to the complete active space SCF (CASSCF) method, which is a typical MCSCF method, electron excitation is allowed only within a predetermined molecular-orbital range (the active space), so as to simplify the formulation. Where the number of molecular orbitals in the active space is n and the number of atomic orbitals is N, usually n<<N. According to a known computing scheme, all the 2-electron integrations at the atomic-orbital base are stored in a main memory or an external storage medium such as a disk, and the transformation shown in Equation (7) is performed. The transformation algorithm is shown in FIGS. 5 and 6. In the case of a simple eight-deep DO loop over the indexes a, b, c, and d of the molecular orbitals and the indexes r, s, t, and u of the atomic orbitals, n⁴N⁴ multiplications are required. According to the above-described algorithm, however, the same result as that of the eight-deep DO loop is obtained by executing a five-deep DO loop four times, and the number of operations is nN⁴+n²N³+n³N²+n⁴N. For example, where n=10 and N=1000, the computing speed increases by a factor of about one thousand. However, this method requires many computer resources for storing the 2-electron integrations at the atomic-orbital base and the intermediate data generated midway through the transformation. Therefore, this method is not suitable for calculating large molecules.
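The equivalence of the eight-deep loop and the four successive five-deep loops can be illustrated with a minimal sketch. It assumes toy sizes and random data; the function names (`transform_naive`, `quarter_step`, `transform_stepwise`) and the dictionary storage are illustrative choices, not part of the described method or of FIGS. 5 and 6.

```python
import itertools
import random

def naive_cost(n, N):
    """Multiplications in the eight-deep loop over a, b, c, d, r, s, t, u."""
    return n**4 * N**4

def stepwise_cost(n, N):
    """Multiplications in four successive five-deep loops, one index per step."""
    return n * N**4 + n**2 * N**3 + n**3 * N**2 + n**4 * N

def transform_naive(ao, c, N, n):
    """Equation (7) evaluated directly for every (a, b, c, d)."""
    mo = {}
    for a, b, k, d in itertools.product(range(n), repeat=4):
        mo[a, b, k, d] = sum(
            c[r][a] * c[s][b] * c[t][k] * c[u][d] * ao[r, s, t, u]
            for r, s, t, u in itertools.product(range(N), repeat=4)
        )
    return mo

def quarter_step(x, c, pos, shape, N, n):
    """Transform the index at position pos from the AO range N to the MO range n."""
    new_shape = shape[:pos] + (n,) + shape[pos + 1:]
    out = {}
    for idx in itertools.product(*(range(k) for k in new_shape)):
        out[idx] = sum(
            c[q][idx[pos]] * x[idx[:pos] + (q,) + idx[pos + 1:]]
            for q in range(N)
        )
    return out, new_shape

def transform_stepwise(ao, c, N, n):
    """Transform u -> d, t -> c, s -> b, r -> a in four separate passes."""
    x, shape = ao, (N, N, N, N)
    for pos in (3, 2, 1, 0):
        x, shape = quarter_step(x, c, pos, shape, N, n)
    return x

# Toy check with random data (N = 3 atomic orbitals, n = 2 molecular orbitals).
N_toy, n_toy = 3, 2
rng = random.Random(0)
ao_toy = {idx: rng.uniform(-1, 1)
          for idx in itertools.product(range(N_toy), repeat=4)}
c_toy = [[rng.uniform(-1, 1) for _ in range(n_toy)] for _ in range(N_toy)]
mo_naive = transform_naive(ao_toy, c_toy, N_toy, n_toy)
mo_step = transform_stepwise(ao_toy, c_toy, N_toy, n_toy)
max_diff = max(abs(mo_naive[k] - mo_step[k]) for k in mo_naive)

# Cost ratio for the sizes quoted in the text (n = 10, N = 1000).
speedup = naive_cost(10, 1000) / stepwise_cost(10, 1000)
print(f"largest deviation: {max_diff:.2e}, cost ratio: {speedup:.0f}")
```

The stepwise version keeps an intermediate array after each pass, which is exactly the storage burden the text identifies for large N.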

In recent years, computers have achieved high-speed operation by using parallel processors. Therefore, in either the MCSCF method or the CI method, the size of a molecule to be calculated by parallel processing needs to be increased and the computing cost needs to be decreased. The advantages of a parallel computer are shown below.

1. A high-speed computer can be achieved at low cost by connecting many commodity processors.

2. Through the use of many local memories of the processors, the computer as a whole obtains a large main-memory area.

The 2-electron integration at the atomic-orbital base and that at the molecular-orbital base are independent of each other. However, where these integrations are parallelized based on this characteristic, the following problems arise, since all the N⁴ 2-electron integrations at the atomic-orbital base are required for calculating one 2-electron integration at the molecular-orbital base.

1. If part of the 2-electron integrations at the atomic-orbital base is distributed to each of the processors, all the N⁴ 2-electron integrations have to be collected by each of the processors. This generates communications among all the processors and becomes a communication bottleneck, even though the integration-computing time can be reduced by parallelization.

2. If all the N⁴ integrations at the atomic-orbital base are calculated in each of the processors in order to avoid the communication bottleneck, the reduction of computing time through parallelization cannot be achieved.

Thus, the above-described problems are mutually contradictory.

Further, the computing according to Equation (2) and the integrations at the molecular-orbital base are required for obtaining the derivative of a 2-electron integration with respect to a nuclear coordinate. In this case, the same problems as those described above occur.

DISCLOSURE OF INVENTION

The object of the present invention is to provide a parallel computing method for solving the above-described problems by using a parallel computer having a plurality of processors.

The parallel computing method according to the present invention will now be described.

(1) A parallel computing method using a parallel computer having a plurality of processors, wherein, when a 2-electron integration is transformed from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd), indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors, and each of the plurality of processors performs processing for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital to be calculated.

(2) A parallel computing method using a parallel computer having a plurality of processors, wherein, when a 2-electron integration is transformed from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd), indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors, and each of the plurality of processors performs processing for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital to be calculated, whereby data to be processed is divided so that grain sizes indicating the workload for the plurality of processors are equalized.

(3) In a parallel computing method according to the method (1) or the method (2),

data transfer is not performed between the plurality of processors during the 2-electron integration computing, and gathering is performed only once after the processing performed by the plurality of processors, so as to obtain the sum of data processed by the plurality of processors.

(4) In a parallel computing method according to the method (1) or the method (2),

each of the plurality of processors

1) calculates a 2-electron integration (RS|tu) at an atomic-orbital base for all combinations of R and S designated for each of the processors, and t and u,

2) performs transformation from u to d for the calculated 2-electron integration,

3) performs transformation from t to c for the data transformed through the processing 2),

4) performs transformation from S to B that is a component of b for the data transformed through the processing 3),

5) performs transformation from R to A that is a component of a for the data transformed through the processing 4),

6) stores the data transformed through the processing 5) in a local memory of each of the processors, and

7) finally performs gathering, thereby obtaining the 2-electron integration (ab|cd) at the molecular-orbital base.

(5) In a parallel computing method according to the method (1) or the method (2),

each of the plurality of processors

1) calculates a 2-electron integration (RS|tu) at an atomic-orbital base for all combinations of R and S designated for each of the processors, and t and u,

2) performs transformation from u to d for the calculated 2-electron integration,

3) performs transformation from t to c for the data transformed through the processing 2),

4) performs transformation from S to B that is a component of b for the data transformed through the processing 3),

5) does not transform R but stores the data in a local memory of each of the plurality of processors as V_(cbd(RS)), and

6) calculates the product sum of a transformation coefficient from R to a, a transformation coefficient between molecular orbitals, and the V_(cbd(RS)), and stores the computing result in a local memory of each of the plurality of processors, in the repetition computings for Y_(ac) indicating an amount required for determining a molecular-orbital coefficient.

(6) A parallel computing method using a parallel computer having a plurality of processors, wherein a secondary density matrix Γ at a molecular-orbital base is transferred to each of the plurality of processors and inversely transformed from the molecular-orbital base to an atomic-orbital base in each of the plurality of processors, whereby indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors, processing is performed for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital to be calculated, and data to be processed is divided so that grain sizes indicating a workload for the plurality of processors are equalized, where a derivative of a 2-electron integration with respect to an atomic-nuclear coordinate is calculated.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the first half of distribution processing performed between processors, where the distribution processing is performed for 2-electron-integration transformation according to the present invention.

FIG. 2 illustrates the latter half of the distribution processing performed between the processors, where the distribution processing is performed for the 2-electron-integration transformation according to the present invention.

FIG. 3 illustrates distribution processing performed between the processors, where the distribution processing is performed for molecular-orbital-coefficient determination repetition computing according to the present invention.

FIG. 4 illustrates a parallel algorithm obtained through inverse transformation of a secondary density matrix Γ from the molecular-orbital base to the atomic-orbital base according to the present invention.

FIG. 5 illustrates steps 1 to 3 of the 4N⁵ transformation algorithm used for transforming a 2-electron integration from an atomic-orbital base to a molecular-orbital base.

FIG. 6 illustrates step 4 of the 4N⁵ transformation algorithm used for transforming the 2-electron integration from the atomic-orbital base to the molecular-orbital base.

BEST MODE FOR CARRYING OUT THE INVENTION

Modes of the present invention will now be described.

According to the inventors of the present invention, since a 2-electron integration at an atomic-orbital base and a derivative with respect to an atomic-nuclear coordinate are independent of each other, parallelization using the indexes r and s of an atomic orbital is required for the transformation from the atomic-orbital base to a molecular-orbital base. That is to say, the indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors, and all the combinations of the designated indexes r and s and the indexes t and u are calculated in each of the processors, so that the data to be processed is divided into equalized grain sizes. On average, floating-point arithmetic needs to be performed about ten thousand times for one 2-electron integration. Consequently, each of the processors has to perform about 10000N² computings, whereby sufficiently large grain sizes are obtained. For increasing parallel-computing performance, the grain sizes need to be equalized and maximized, and the number of transfers between the processors needs to be decreased. This configuration will be illustrated later. According to the computing procedures of the present invention, data transfer is not performed between the processors during the 2-electron-integration computing, and gathering is performed once after the transformation.

In the solution for obtaining the electron-configuration coefficient C, parallelization is performed over the indexes r and s of the atomic orbital. The computing procedures in each of the processors are described below.

1) A 2-electron integration (RS|tu) at the atomic-orbital base is calculated for all the combinations of R and S designated for each processor, and t and u.

2) Transformation from u to d is performed for the calculated 2-electron integration.

3) Transformation from t to c is performed.

4) Transformation from S to B is performed.

5) Transformation from R to A is performed.

The transformed data is stored in a local memory of each of the processors. Finally, gathering is performed for obtaining a 2-electron integration (ab|cd) at the molecular-orbital base.
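The five steps and the final gathering can be sketched as follows. This is a minimal illustration in which the processors are simulated one after another in a single process; `ao_integral` is a hypothetical stand-in for the evaluation of Equation (4), the coefficient values are arbitrary, and the names `worker` and `gather` are assumptions made for this example.

```python
import itertools
import math

N, n = 3, 2  # toy sizes; the text assumes n << N

def ao_integral(r, s, t, u):
    """Hypothetical stand-in for (rs|tu); a real code would evaluate Equation (4)."""
    return math.cos(1.0 + r + 2 * s + 3 * t + 5 * u)

# Arbitrary toy molecular-orbital coefficients c[r][a].
c = [[math.sin(1.0 + 2 * r + a) for a in range(n)] for r in range(N)]

def worker(R, S):
    """Steps 1) to 5) for one designated pair (R, S); no communication is needed."""
    # 1) (RS|tu) for all t and u
    v_tu = [[ao_integral(R, S, t, u) for u in range(N)] for t in range(N)]
    # 2) transformation from u to d
    v_td = [[sum(c[u][d] * v_tu[t][u] for u in range(N)) for d in range(n)]
            for t in range(N)]
    # 3) transformation from t to c (the MO index c is called k here)
    v_kd = [[sum(c[t][k] * v_td[t][d] for t in range(N)) for d in range(n)]
            for k in range(n)]
    # 4) S -> B and 5) R -> A: scale by the coefficients of this pair only
    partial = {}
    for a, b, k, d in itertools.product(range(n), repeat=4):
        partial[a, b, k, d] = c[R][a] * c[S][b] * v_kd[k][d]
    return partial

def gather(partials):
    """Single gathering step: sum the n^4 partial values (AB|cd) of all workers."""
    total = {}
    for part in partials:
        for key, value in part.items():
            total[key] = total.get(key, 0.0) + value
    return total

mo = gather(worker(R, S) for R in range(N) for S in range(N))
```

Each worker returns only n⁴ numbers, so the gathered volume is independent of how many t and u values were processed locally.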

In the repetition solution for obtaining the molecular-orbital coefficient c, parallelization is performed over the indexes r and s of the atomic orbital. The computing procedures performed in each of the processors are described below.

1) A 2-electron integration (RS|tu) at the atomic-orbital base is calculated for all the combinations of R and S designated for each processor, and t and u.

2) Transformation from u to d is performed for the calculated 2-electron integration.

3) Transformation from t to c is performed.

4) Transformation from S to B is performed.

5) R is not transformed; the data is stored in the local memory as V_(cbd(RS)).

6) The product sum of a transformation coefficient (c_(Ra)) from R to a, a transformation coefficient (u_(bd)) between the molecular orbitals, and V_(cbd(RS)) is calculated concurrently with the repetition computing of Y_(ac) (see Equation (13)). Since the local memory of each processor stores V_(cbd(RS)), which has a small amount of data, the work area on the main memory is prevented from growing due to the repetition computing.

In the energy-gradient computing, the secondary density matrix Γ at the molecular-orbital base is transferred to each of the processors, and inverse transformation from the molecular-orbital base to the atomic-orbital base is performed. Subsequently, parallelization is performed over the indexes r and s of the atomic orbital, whereby the configuration becomes the same as in the case of the 2-electron-integration computing. In this case, the derivative of only an atomic orbital r with respect to a nuclear coordinate needs to be obtained.

Embodiments of the present invention will now be described.

For obtaining the electron-configuration coefficient C and the molecular-orbital coefficient c in the MCSCF method, and the electron-configuration coefficient C in the CI method, a repetition (iterative) solution is widely used for reducing the work area on a main memory. Therefore, intermediate data such as the 2-electron integrations at the molecular-orbital base need to be stored. The parallel computer can store the intermediate data owing to the above-described advantage 2 (through the use of many local memories of the processors, the computer as a whole obtains a large main-memory area). On the other hand, since the repetition computing is unnecessary for the energy-gradient computing, the present invention achieves a parallel algorithm based on the basic ideas given under 1) and 2) below.

1) For determining the electron-configuration coefficient C and the molecular-orbital coefficient c, where n<<N, intermediate data such as the 2-electron integrations at the molecular-orbital base, whose data amount is small, are stored in the main memory of each of the processors and used repetitively.

2) In the energy-gradient computing, γ and Γ at the molecular-orbital base are inversely transformed to the atomic-orbital base and subjected to parallel computing based on an equation for the atomic-orbital base, as shown below.

$\begin{matrix}{\frac{\partial E}{\partial q} = \sum\limits_{rs}^{\mathrm{AO}} \gamma_{rs} \frac{\partial h_{rs}}{\partial q} + \sum\limits_{rstu}^{\mathrm{AO}} \Gamma_{rstu} \frac{\partial (rs|tu)}{\partial q} - \sum\limits_{rs}^{\mathrm{AO}} W_{rs} \frac{\partial S_{rs}}{\partial q}} & (11)\end{matrix}$

Here, AO stands for an atomic orbital. W indicates an amount used for the energy-gradient method. This amount is obtained by calculating the product of orbital energy and the molecular-orbital coefficient. Further, S_(rs) indicates an overlap integration.

First, a method for generating the 2-electron integrations at the molecular-orbital base belonging to the active space, which are required for determining the electron-configuration coefficient C, is described. Parallelization is performed over the indexes r and s of the atomic orbital. Further, a 2-electron integration (RS|tu) at the atomic-orbital base is calculated for all t and u in each of the processors. The basic algorithm is shown in FIGS. 1 and 2. Here, R and S denote the values of the atomic-orbital indexes r and s that are allocated to a given processor. An equation is provided as below.

$\begin{matrix}{(ab|cd) = \sum\limits_{A} \sum\limits_{B} (c_{RA} R \, c_{SB} S | cd) = \sum\limits_{A} \sum\limits_{B} (AB|cd)} & (12)\end{matrix}$

Here, A and B denote the components of the molecular orbitals a and b that, according to the relationship shown in Equation (5), arise only from the atomic orbitals R and S designated for each processor among the linear-combination components r and s. Finally, (AB|cd) is gathered, that is, summed over all the processors, whereby the correct (ab|cd) is obtained. According to this relationship, the computing procedures performed in each of the processors are provided as below.

1) A 2-electron integration (RS|tu) at the atomic-orbital base is calculated for all the combinations of R and S designated for each processor, and t and u.

2) Transformation from u to d is performed for the calculated 2-electron integration.

3) Transformation from t to c is performed.

4) Transformation from S to B is performed.

5) Transformation from R to A is performed.

It should be noted that no communications between the processors are generated while steps 1) to 5) are performed and that the total number of 2-electron integrations at the atomic-orbital base calculated in all the processors is N⁴. Finally, (AB|cd) on each of the processors is gathered, so as to obtain (ab|cd). The transfer amount is n⁴ at most for each processor. Since n is on the order of 10 and the transfer is performed in one direction to a master processor, the transfer time presents no problem. Further, where the number of processors is small, R and S may be grouped, as required, for achieving parallelization in a like manner. The total number of computings according to the known method is nN⁴+n²N³+n³N²+n⁴N, as described above. According to this method, however, since the computings at steps 1 to 4 are performed in each of N² processors and are followed by gathering, the total number of computings is N²(nN²+n²N+n³+n⁴)+n⁴N². Thus, the number of computings according to this method increases by n⁴N(N−1)+n⁴N² through parallelization.
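The increase stated above follows from simple arithmetic, which the short check below reproduces. The function names are illustrative only; the expressions are the ones given in the text.

```python
def known_total(n, N):
    """Total computings of the known four-step transformation."""
    return n * N**4 + n**2 * N**3 + n**3 * N**2 + n**4 * N

def parallel_total(n, N):
    """Total computings of this method on N^2 processors, including gathering."""
    per_processor = n * N**2 + n**2 * N + n**3 + n**4  # steps 1 to 4
    gathering = n**4 * N**2
    return N**2 * per_processor + gathering

n, N = 10, 1000
extra = parallel_total(n, N) - known_total(n, N)
relative_overhead = extra / known_total(n, N)

# The difference matches the expression quoted in the text.
assert extra == n**4 * N * (N - 1) + n**4 * N**2
print(f"relative increase: {relative_overhead:.4%}")
```

For n=10 and N=1000 the relative increase is a small fraction of one percent, which is consistent with the later remark that the computing amount grows only a little.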

Next, parallelization of Equation (10), which is necessary for determining the molecular-orbital coefficient c, will be described. Part of Equation (10) is now expressed at the atomic-orbital base.

$\begin{matrix}\begin{aligned} y_{ac} &= \sum\limits_{b} \sum\limits_{d} \sum\limits_{x} \sum\limits_{y} \{ (ab|xy) \Gamma_{cdxy} + 2 (ax|by) \Gamma_{cxdy} \} u_{bd} \\ &= \sum\limits_{b} \sum\limits_{d} \sum\limits_{x} \sum\limits_{y} \{ \sum\limits_{R} \sum\limits_{S} (c_{Ra} R \, c_{Sb} S | xy) \Gamma_{cdxy} + 2 \sum\limits_{R} \sum\limits_{S} (c_{Ra} R \, c_{Sx} S | by) \Gamma_{cxdy} \} u_{bd} \\ &= \sum\limits_{R} \sum\limits_{S} \sum\limits_{b} \sum\limits_{d} \sum\limits_{x} \sum\limits_{y} \{ (c_{Ra} R \, c_{Sb} S | xy) \Gamma_{cdxy} + 2 (c_{Ra} R \, c_{Sx} S | by) \Gamma_{cxdy} \} u_{bd} \\ &= \sum\limits_{R} \sum\limits_{S} c_{Ra} \sum\limits_{b} \sum\limits_{d} \sum\limits_{x} \sum\limits_{y} \{ (R \, c_{Sb} S | xy) \Gamma_{cdxy} + 2 (R \, c_{Sx} S | by) \Gamma_{cxdy} \} u_{bd} \\ &= \sum\limits_{R} \sum\limits_{S} c_{Ra} \sum\limits_{b} \sum\limits_{d} V_{cbd(RS)} u_{bd} \\ &= \sum\limits_{R} \sum\limits_{S} Y_{ac(RS)} \end{aligned} & (13)\end{matrix}$

Further, as in the case of Equation (12), Y_(ac(RS)), which is the part calculated in each of the processors (CPU: central processing unit), can be gathered. It should be noted that the transfer-data amount is N². Consequently, y_(ac) is obtained by performing parallelization over r and s and gathering the Y_(ac(RS)) calculated in each of the processors. The data flow is shown in FIG. 3. The reason why the data is stored as V_(cbd(RS)) in each of the processors is as follows. Since a and b in Equation (10) range over all the molecular orbitals in principle, their dimension is N, whereas the dimension of x and y is n. Consequently, the size of the first and second terms is n²N². Since R is stored without being transformed, the number of computings increases; however, the size of the work area is reduced to n²N. Further, if (ab|xy) were gathered, the data amount would be n²N². However, the amount is cut to N² by calculating the sum over x and y in each of the processors before the transfer, whereby the transfer-data amount is reduced.
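The last two lines of Equation (13) can be sketched as follows. The sketch assumes that V_(cbd(RS)) is already held on each processor; `stored_v` is a hypothetical placeholder that merely returns arbitrary numbers, the index ranges (a and b over N, c and d over n) follow the dimensions discussed above, and all names are illustrative. The processors are again simulated sequentially.

```python
import math

N, n = 3, 2  # toy sizes: a, b run over N orbitals; c, d over n active orbitals

# Arbitrary toy values for c_{Ra} and u_{bd}.
coef = [[math.sin(1.0 + 2 * r + a) for a in range(N)] for r in range(N)]
u = [[math.cos(0.5 + b - d) for d in range(n)] for b in range(N)]

def stored_v(R, S):
    """Hypothetical V_{cbd(RS)} kept in the local memory of the (R, S) processor."""
    return [[[math.sin(0.3 * (1 + R + 2 * S + c + 3 * b + 5 * d))
              for d in range(n)] for b in range(N)] for c in range(n)]

def partial_y(R, v):
    """Y_{ac(RS)} = c_{Ra} * sum_b sum_d V_{cbd(RS)} u_{bd} on one processor."""
    w = [sum(v[c][b][d] * u[b][d] for b in range(N) for d in range(n))
         for c in range(n)]
    return [[coef[R][a] * w[c] for c in range(n)] for a in range(N)]

def gather_y():
    """Sum the partial results Y_{ac(RS)} of all (R, S) processors into y_{ac}."""
    y = [[0.0] * n for _ in range(N)]
    for R in range(N):
        for S in range(N):
            part = partial_y(R, stored_v(R, S))
            for a in range(N):
                for c in range(n):
                    y[a][c] += part[a][c]
    return y

y = gather_y()
```

Because u_(bd) changes between repetitions while V_(cbd(RS)) does not, only `partial_y` and the gathering would need to be repeated in each cycle under these assumptions.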

Finally, parallelization of the differentiation of 2-electron integrations with respect to an atomic-nuclear coordinate will now be described. This parallelization is performed for the energy-gradient computing. As described above, this computing does not require repetition computings. Therefore, the parallelization is achieved by transferring Γ, whose data amount is as small as n⁴, to each of the processors and inversely transforming it from the molecular-orbital base to the atomic-orbital base. This inverse-transformation algorithm is shown in FIG. 4. The parallelization is performed by using r and s, and computings are performed for all t and u, as in the case of an ordinary 2-electron integration, except that the depth of the DO loop is seven. The second term of Equation (11) calculated by each of the processors is added to an array that holds the forces applied to the atomic nuclei on that processor, and the arrays are gathered. The size of the array is three times the number of atoms and does not become a bottleneck in transfer. The number of computings for the inverse transformation of Γ is the same as that of the known methods.
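As a hedged sketch of the per-processor work described above, the contribution of the processor holding the pair (R, S) may be written as below. The source does not write this formula out; the sketch assumes the usual back-transformation with the molecular-orbital coefficients of Equation (5) and assumes that the factor 1/2 of Equation (2) is absorbed into Γ_(rstu). The symbol F_q^{(RS)} is introduced only for this illustration.

$\begin{matrix}{\Gamma_{RStu} = \sum\limits_{a} \sum\limits_{b} \sum\limits_{c} \sum\limits_{d} c_{Ra} c_{Sb} c_{tc} c_{ud} \Gamma_{abcd}, \quad F_{q}^{(RS)} = \sum\limits_{t} \sum\limits_{u} \Gamma_{RStu} \frac{\partial (RS|tu)}{\partial q}, \quad \sum\limits_{rstu}^{\mathrm{AO}} \Gamma_{rstu} \frac{\partial (rs|tu)}{\partial q} = \sum\limits_{R} \sum\limits_{S} F_{q}^{(RS)}}\end{matrix}$

Under these assumptions, each processor only needs the n⁴ elements of Γ as input and returns one value per nuclear coordinate q, which matches the small array size stated above.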

According to the above-described parallel algorithm, the parallelization is performed over the indexes r and s of the atomic orbital. Therefore, the grain sizes remain equalized as the number of processors increases, whereby high scalability is maintained. Further, since the suitable number of processors is N², this method can be used on a parallel computer having up to one million processors, where N=1000.
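Where fewer than N² processors are available, the (R, S) pairs can be grouped, as noted earlier. The sketch below shows one possible grouping, a round-robin assignment; this particular scheme and the name `assign_pairs` are assumptions for illustration, and the sketch idealizes every pair as costing the same amount of work.

```python
def assign_pairs(N, num_procs):
    """Deal the N*N designated (R, S) pairs out to num_procs processors in turn."""
    buckets = [[] for _ in range(num_procs)]
    for k in range(N * N):
        # divmod(k, N) turns the running pair number k into the pair (R, S).
        buckets[k % num_procs].append(divmod(k, N))
    return buckets

buckets = assign_pairs(N=10, num_procs=8)
loads = [len(bucket) for bucket in buckets]
print(loads)
```

With this assignment the number of pairs per processor differs by at most one, so the grain sizes stay nearly equal for any processor count up to N².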

Tables 1 and 2 below compare the computing amounts and work-area sizes of the known computing method with those of this method. Although the computing amount increases a little due to the parallelization, the size of the work area in each of the processors decreases by a factor of N². Consequently, this algorithm can effectively use inexpensive local memories.

TABLE 1 Comparison between computing amount and work area for 2-electron integration transformation according to the known transformation method, and those according to the transformation method of the present invention

| | N⁵ multiplication method: Computing Amount | N⁵ multiplication method: Work Area | This method (Note 1): Computing Amount | This method (Note 1): Work Area |
|---|---|---|---|---|
| STEP 1 | N⁴n | N⁴ + N³n | N²n | N² + Nn |
| STEP 2 | N³n² | N³n + N²n² | Nn² | Nn + n² |
| STEP 3 | N²n³ | N²n² + Nn³ | n³ | n² + n³ |
| STEP 4 | Nn⁴ | Nn³ + n⁴ | n⁴ | n³ + n⁴ |
| Gathering | - | - | N²n⁴ | |

Note 1) where N² processors are used

TABLE 2 Comparison between computing amount and work area for molecular-orbital coefficient determination according to the known transformation method, and those according to the transformation method of the present invention

| | N⁵ multiplication method: Computing Amount | N⁵ multiplication method: Work Area | This method (Note 1): Computing Amount | This method (Note 1): Work Area |
|---|---|---|---|---|
| STEP 1 | N⁴n | N⁴ + N³n | N²n | N² + Nn |
| STEP 2 | N³n² | N³n + N²n² | Nn² | Nn + n² |
| STEP 3 | N³n² | 2(N²n²) | Nn² | n² + Nn² |
| STEP 4 | N³n² | 2(N²n²) | N²n² | Nn² + Nn³ |
| Gathering | - | - | N² | |

Note 1) where N² processors are used

As has been described, the present invention provides a parallel computing method using a parallel computer having a plurality of processors. According to this method, where a 2-electron integration is transformed from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd), indexes r and s of an atomic orbital to be calculated are designated for the plurality of processors. Further, each of the plurality of processors performs processing for all the combinations of the designated indexes R and S and indexes t and u of the atomic orbital to be calculated. Therefore, equalized grain sizes are obtained and many commodity processors can be connected to the parallel computer with high performance, so that the cost of the high-speed computer is reduced. Further, since the computer can use inexpensive local memories, it can secure a large main-memory area as a whole.

1. A parallel computing method for transforming a 2-electron integration from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd) using a parallel computer having a plurality of processors, the method reducing an amount of computations by calculating at least three of the four indexes among (ab|cd) at the molecular orbital base, the method comprising: designating indexes r and s for the plurality of processors to calculate 2-electron integration (RS|tu) at the atomic orbital base, wherein designated indexes R and S indicate an atomic orbital allocated to a predetermined processor from the plurality of processors; performing processing for all combinations of designated indexes R and S and indexes t and u of the atomic orbital in parallel by the plurality of processors to calculate the at least three of the four indexes among (ab|cd) at the molecular orbital base; and storing the at least three of the four calculated indexes among (ab|cd) at the molecular orbital base in a memory.
2. The parallel computing method according to claim 1, wherein data transfer is not performed between the plurality of processors during the 2-electron integration computing and wherein gathering is performed only once after the processing performed by the plurality of processors, so as to obtain the sum of data processed by the plurality of processors.
3. The parallel computing method according to claim 1, wherein each of the plurality of processors (1) calculates the 2-electron integration (RS|tu) at the atomic-orbital base for all combinations of R and S designated for each of the processors, and t and u, (2) performs transformation from u to d for the calculated 2-electron integration, (3) performs transformation from t to c for the data transformed through the processing (2), (4) performs transformation from S to B that is a component of b for the data transformed through the processing (3), (5) performs transformation from R to A that is a component of a for the data transformed through the processing (4), (6) stores the data transformed through the processing (5) in a local memory of each of the processors, and (7) finally performs gathering, thereby obtaining the 2-electron integration (ab|cd) at the molecular-orbital base.
4. The parallel computing method according to claim 1, wherein each of the plurality of processors (1) calculates the 2-electron integration (RS|tu) at the atomic-orbital base for all the combinations of R and S designated for each of the processors, and t and u, (2) performs transformation from u to d for the calculated 2-electron integration, (3) performs transformation from t to c for the data transformed through the processing (2), (4) performs transformation from S to B that is a component of b for the data transformed through the processing (3), (5) does not transform R but stores in a local memory of each of the plurality of processors, as V_(cbd(RS)), and (6) calculates the product sum of a transformation coefficient from R to a, a transformation coefficient between molecular orbitals, and the V_(cbd(RS)), and stores the computing result that is the calculated product sum in a local memory of each of the plurality of processors in repetition computings for Y_(ac) indicating an amount required for determining a molecular-orbital coefficient.
 5. The parallel computing method according to claim 3, wherein designated indexes R and S are included in linear combination components of atomic-orbitals r and s with reference to molecular-orbitals a and b.
 6. A parallel computing method for transforming a 2-electron integration from an atomic-orbital base (rs|tu) to a molecular-orbital base (ab|cd) using a parallel computer having a plurality of processors, the method reducing an amount of computations by calculating at least three of the four indexes among (ab|cd) at the molecular orbital base, the method comprising: dividing data to be processed so that grain sizes indicating the workload for the plurality of processors are equalized; designating indexes r and s for the plurality of processors to calculate 2-electron integration (RS|tu) at the atomic orbital base, wherein designated indexes R and S indicate an atomic orbital allocated to a predetermined processor from the plurality of processors; performing processing for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital in parallel by the plurality of processors to calculate the at least three of the four indexes among (ab|cd) at the molecular orbital base; and storing the at least three of the four calculated indexes among (ab|cd) at the molecular orbital base in a memory.
 7. The parallel computing method according to claim 6, wherein data transfer is not performed between the plurality of processors during the 2-electron integration computing and wherein gathering is performed only once after the processing performed by the plurality of processors, so as to obtain the sum of data processed by the plurality of processors.
 8. The parallel computing method according to claim 6, wherein each of the plurality of processors (1) calculates the 2-electron integration (RS|tu) at the atomic-orbital base for all the combinations of R and S designated for each of the processors, and t and u, (2) performs transformation from u to d for the calculated 2-electron integration, (3) performs transformation from t to c for the data transformed through the processing (2), (4) performs transformation from S to B that is a component of b for the data transformed through the processing (3), (5) performs transformation from R to A that is a component of a for the data transformed through the processing (4), (6) stores the data transformed through the processing (5) in a local memory of each of the processors, and (7) finally performs gathering, thereby obtaining the 2-electron integration (ab|cd) at the molecular-orbital base.
 9. The parallel computing method according to claim 6, wherein each of the plurality of processors (1) calculates the 2-electron integration (RS|tu) at the atomic-orbital base for all combinations of R and S designated for each of the processors, and t and u, (2) performs transformation from u to d for the calculated 2-electron integration, (3) performs transformation from t to c for the data transformed through the processing (2), (4) performs transformation from S to B that is a component of b for the data transformed through the processing (3), (5) does not transform R but stores in a local memory of each of the plurality of processors, as V_(cbd(RS)), and (6) calculates the product sum of a transformation coefficient from R to a, a transformation coefficient between molecular orbitals, and the V_(cbd(RS)), and stores the computing result that is the calculated product sum in a local memory of each of the plurality of processors in repetition computings for Y_(ac) indicating an amount required for determining a molecular-orbital coefficient.
 10. The parallel computing method according to claim 8, wherein designated indexes R and S are included in linear combination components of atomic-orbitals r and s with reference to molecular-orbitals a and b.
 11. A parallel computing method using a parallel computer having a plurality of processors, wherein a secondary density matrix Γ (abcd) at a molecular-orbital base is inversely transformed from the molecular-orbital base to an atomic-orbital base, the atomic orbital base having four indexes (RStu), and calculating a derivative of a 2-electron integration at the atomic orbital base with respect to an atomic-nuclear coordinate, the method reducing an amount of computations by calculating all the four indexes (RStu) at the atomic orbital base in parallel, the method comprising: dividing data to be processed so that grain sizes indicating a workload for the plurality of processors are equalized; transferring the secondary density matrix Γ to each of the plurality of processors; performing parallel processing to compute the inverse transformation of the secondary density matrix Γ (abcd) from the molecular-orbital base to the atomic-orbital base to obtain secondary density matrix Γ (RStu) at the atomic orbital base, wherein all the four indexes (RStu) at the atomic orbital base are calculated in parallel by each of the plurality of processors; designating indexes R and S of the atomic orbital indexes computed above for the plurality of processors to calculate a 2-electron integration (RS|tu) at the atomic orbital base; performing processing for all combinations of the designated indexes R and S and indexes t and u of the atomic orbital to calculate the 2-electron integration (RS|tu) at the atomic orbital base; calculating the derivative of the 2-electron integration (RS|tu) with respect to the atomic-nuclear coordinate; and storing said calculated derivative of the 2-electron integration (RS|tu) with respect to the atomic nuclear coordinate in a memory.
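By way of illustration only, the processing order recited in claims 3 and 8 may be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: `ao_integral` is a hypothetical toy function standing in for a real (rs|tu) evaluation, the processors are simulated one after another in a single process, the round-robin assignment of (R, S) pairs is merely one assumed way of equalizing grain sizes, and all function names are introduced here for the example. Each simulated processor handles only its designated (R, S) pairs, accumulates its contribution in its own local array, and the contributions are summed in a single gathering step at the end.

```python
from itertools import product


def ao_integral(r, s, t, u):
    # Hypothetical stand-in for a real (rs|tu) evaluation (assumption).
    return 1.0 / (1.0 + (r + 1) * (s + 1) + (t + 1) * (u + 1))


def zeros4(m):
    # m x m x m x m nested list of zeros
    return [[[[0.0] * m for _ in range(m)] for _ in range(m)] for _ in range(m)]


def worker(pairs, C, n, m):
    """Process the designated (R, S) pairs; nothing is exchanged with other workers."""
    local = zeros4(m)  # plays the role of the processor's local memory
    for R, S in pairs:
        # (1) AO-base integrals (RS|tu) for all t and u
        g_tu = [[ao_integral(R, S, t, u) for u in range(n)] for t in range(n)]
        # (2) transformation from u to d
        g_td = [[sum(g_tu[t][u] * C[u][d] for u in range(n)) for d in range(m)]
                for t in range(n)]
        # (3) transformation from t to c
        g_cd = [[sum(C[t][c] * g_td[t][d] for t in range(n)) for d in range(m)]
                for c in range(m)]
        # (4) S -> B, (5) R -> A, (6) accumulate the components in local memory
        for a, b, c, d in product(range(m), repeat=4):
            local[a][b][c][d] += C[R][a] * C[S][b] * g_cd[c][d]
    return local


def gather(partials, m):
    # (7) single gathering step: element-wise sum over all workers
    total = zeros4(m)
    for part in partials:
        for a, b, c, d in product(range(m), repeat=4):
            total[a][b][c][d] += part[a][b][c][d]
    return total


def parallel_transform(C, n, m, nproc):
    pairs = list(product(range(n), repeat=2))
    # Round-robin split so that workers receive nearly equal numbers of pairs.
    chunks = [pairs[p::nproc] for p in range(nproc)]
    partials = [worker(chunk, C, n, m) for chunk in chunks]  # sequential stand-in
    return gather(partials, m)


def direct_transform(C, n, m):
    # Reference: straightforward four-index sum, used only to check the sketch.
    out = zeros4(m)
    for a, b, c, d in product(range(m), repeat=4):
        out[a][b][c][d] = sum(
            C[r][a] * C[s][b] * C[t][c] * C[u][d] * ao_integral(r, s, t, u)
            for r, s, t, u in product(range(n), repeat=4)
        )
    return out
```

In this sketch `C[r][a]` denotes the coefficient of atomic orbital r in molecular orbital a, with n atomic orbitals and m molecular orbitals. Because every worker needs only the integrals for its own (R, S) pairs, no data is exchanged until the final `gather` call, which corresponds to the single gathering recited in claims 2 and 7.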